
    Moderate Responder Committees Maximize Fairness in (NxM)-Person Ultimatum Games

    We introduce and study a multiplayer version of the classical Ultimatum Game in which a group of N Proposers jointly offers a division of resources to a group of M Responders. In general, the proposal is rejected if the (average) proposed offer is lower than the (average) response threshold in the Responders group. A motivation for our work is the exchange of flexibilities between different smart energy communities, where the surplus of one community can be offered to meet the demand of a second community. We find that, in the absence of any mechanism, the co-evolving populations of Proposers and Responders converge to a state in which proposals and acceptance thresholds are low, as predicted by rational choice theory. This effect is more pronounced when the Proposers' groups are larger (i.e., large N). Low proposals imply an unfair exchange that strongly favors the Proposers. To circumvent this drawback, we test different committee selection rules that determine how Responders should be selected to form decision-making groups, contingent on their declared acceptance thresholds. We find that selecting the lowest-demanding Responders maintains unfairness. However, less trivially, selecting the highest-demanding individuals also fails to resolve this imbalance and yields a worse outcome for all due to a high fraction of rejected proposals. Selecting moderate Responders optimizes overall fitness.
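
    The group-level acceptance rule and the idea of a "moderate" committee can be sketched in a few lines of Python. The function names, the median-distance notion of moderation and the toy numbers below are illustrative assumptions, not the paper's actual evolutionary model.

```python
def accepted(offers, thresholds):
    """Acceptance rule from the abstract: the joint proposal is accepted
    iff the average offer is at least the average response threshold."""
    return sum(offers) / len(offers) >= sum(thresholds) / len(thresholds)


def moderate_committee(declared_thresholds, m):
    """Committee selection sketch: pick the M Responders whose declared
    thresholds lie closest to the population median ("moderate" Responders)."""
    ranked = sorted(declared_thresholds)
    median = ranked[len(ranked) // 2]
    return sorted(declared_thresholds, key=lambda t: abs(t - median))[:m]


# Toy round: 3 Proposers jointly offer shares; a committee of 2 moderate
# Responders decides on behalf of the Responder population.
offers = [0.20, 0.30, 0.25]
declared = [0.05, 0.20, 0.45, 0.90, 0.30]
committee = moderate_committee(declared, m=2)
print(committee, accepted(offers, committee))  # [0.3, 0.2] True
```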

    On rational delegations in liquid democracy

    Liquid democracy is a proxy voting method in which proxies are delegable. We propose and study a game-theoretic model of liquid democracy to address the following question: when is it rational for a voter to delegate her vote? We study the existence of pure-strategy Nash equilibria in this model and how group accuracy is affected by them. We complement these theoretical results with agent-based simulations that study the effects of delegation on group accuracy in variously structured social networks.
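
    As a rough illustration of how delegation chains can change group accuracy (a Monte Carlo toy, not the paper's formal model), the sketch below casts each ballot according to the judgement of the voter's final proxy and scores the majority outcome; the voter accuracies, the delegation maps and the helper names are hypothetical.

```python
import random


def resolve(voter, delegates_to):
    """Follow a delegation chain to its final proxy; assumes no cycles."""
    while delegates_to.get(voter) is not None:
        voter = delegates_to[voter]
    return voter


def group_accuracy(accuracies, delegates_to, trials=10_000):
    """Estimate the probability that a majority vote is correct when every
    ballot is cast by the voter's final proxy."""
    voters = list(accuracies)
    wins = 0
    for _ in range(trials):
        judgement = {v: random.random() < accuracies[v] for v in voters}  # True = correct
        correct_ballots = sum(judgement[resolve(v, delegates_to)] for v in voters)
        wins += correct_ballots > len(voters) / 2
    return wins / trials


accuracies = {"a": 0.60, "b": 0.55, "c": 0.90, "d": 0.52, "e": 0.58}
print(group_accuracy(accuracies, {}))                                        # no delegation
print(group_accuracy(accuracies, {"a": "c", "b": "c", "d": "c", "e": "c"}))  # all follow "c"
```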

    Stability of cooperation in societies of emotional and moody agents

    It is well documented that cooperation may not be achieved in societies where self-interested agents engage in Prisoner’s Dilemma games.
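
    For context, the tension the paper builds on can be seen directly in a textbook Prisoner’s Dilemma payoff matrix (the numbers below are the usual illustrative values, not taken from this work): defection is the best response to either choice, even though mutual cooperation pays both players more.

```python
# Row player's payoffs in a textbook Prisoner's Dilemma (T > R > P > S).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0,  # R = 3, S = 0
          ("D", "C"): 5, ("D", "D"): 1}  # T = 5, P = 1

for opponent in ("C", "D"):
    best = max(("C", "D"), key=lambda me: PAYOFF[(me, opponent)])
    print(f"against {opponent}: best response is {best}")  # "D" in both cases
```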

    RLBOA: A modular reinforcement learning framework for autonomous negotiating agents

    Negotiation is a complex problem in which the variety of settings and opponents that may be encountered prohibits the use of a single predefined negotiation strategy. Hence, an agent should be able to learn such a strategy autonomously. To this end, we propose RLBOA, a modular framework that facilitates the creation of autonomous negotiation agents using reinforcement learning. The framework allows for the creation of agents that are capable of negotiating effectively in many different scenarios. To cope with the large state and action spaces and the diversity of settings, we leverage the modular BOA framework, which decouples the negotiation strategy into a Bidding strategy, an Opponent model and an Acceptance condition. Furthermore, we map the multidimensional contract space onto the utility axis, which enables a compact and generic state and action description. We demonstrate the value of the RLBOA framework by implementing an agent that uses tabular Q-learning on the compressed state and action space to learn a bidding strategy. We show that the resulting agent is able to learn well-performing bidding strategies in a range of negotiation settings and is able to generalize across opponents and domains.
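
    A minimal tabular sketch of this idea, assuming hypothetical class and method names and a simple uniform utility binning: because states and actions are expressed on the utility axis, a small Q-table suffices for the bidding strategy, while the opponent model and acceptance condition remain separate components as in the BOA decomposition.

```python
import random
from collections import defaultdict


class TabularBiddingStrategy:
    """Q-learning bidding strategy over binned utilities (illustrative only)."""

    def __init__(self, n_bins=10, alpha=0.1, gamma=0.95, eps=0.1):
        self.n_bins, self.alpha, self.gamma, self.eps = n_bins, alpha, gamma, eps
        self.q = defaultdict(float)  # (state_bin, action_bin) -> estimated value

    def bin(self, utility):
        return min(int(utility * self.n_bins), self.n_bins - 1)

    def choose_target_utility(self, opponent_offer_utility):
        """State = binned utility of the opponent's last offer to us;
        action = binned utility we demand in our counter-offer."""
        s = self.bin(opponent_offer_utility)
        if random.random() < self.eps:
            a = random.randrange(self.n_bins)
        else:
            a = max(range(self.n_bins), key=lambda b: self.q[(s, b)])
        return s, a, (a + 0.5) / self.n_bins  # target utility on the utility axis

    def update(self, s, a, reward, next_offer_utility, done):
        s2 = self.bin(next_offer_utility)
        target = reward if done else reward + self.gamma * max(
            self.q[(s2, b)] for b in range(self.n_bins))
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```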

    Lenient multi-agent deep reinforcement learning

    Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently by sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates sampled from the ERM. This introduces optimism into the value-function update and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22], as well as a modified version we call scheduled-HDQN that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8], which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic-reward CMOTP than standard and scheduled-HDQN agents.
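
    The leniency mechanism itself can be stated compactly in tabular form. The sketch below is one reading of the description above (a decaying per state-action temperature gating negative updates); the constants and names are made up for illustration and are not the paper's hyperparameters or the deep LDQN architecture.

```python
import math
import random
from collections import defaultdict


class LenientTabularQ:
    """Tabular Q-learning with leniency towards negative updates (sketch)."""

    def __init__(self, alpha=0.1, gamma=0.95, initial_temp=1.0, decay=0.995, k=2.0):
        self.alpha, self.gamma, self.k, self.decay = alpha, gamma, k, decay
        self.q = defaultdict(float)
        self.temp = defaultdict(lambda: initial_temp)  # per (state, action) temperature

    def update(self, s, a, reward, s2, actions, done):
        target = reward if done else reward + self.gamma * max(self.q[(s2, b)] for b in actions)
        delta = target - self.q[(s, a)]
        leniency = 1.0 - math.exp(-self.k * self.temp[(s, a)])  # high temperature -> very lenient
        self.temp[(s, a)] *= self.decay                         # leniency decays with visits
        # Positive updates are always applied; negative ones are ignored with
        # probability `leniency`, which introduces optimism early in learning.
        if delta >= 0 or random.random() > leniency:
            self.q[(s, a)] += self.alpha * delta
```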

    Robust multi-agent Q-learning in cooperative games with adversaries

    We present RoM-Q, a new Q-learning-like algorithm for finding policies robust to attacks in multi-agent systems (MAS). We consider a novel type of attack, in which a team of adversaries, aware of the optimal multi-agent Q-value function, performs a worst-case selection of both the agents to attack and the actions to perform. Our motivation lies in real-world MAS, where vulnerabilities of particular agents emerge due to their characteristics and robust policies need to be learned without simulating attacks during training. In our simulations, where we train policies using RoM-Q, Q-learning and minimax-Q and derive corresponding adversarial attacks, we observe that policies learned using RoM-Q are more robust, as they accrue the highest rewards against all considered adversarial attacks.
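
    The abstract does not include pseudocode, but the worst-case flavour of the update can be sketched with a joint-action Q-table whose bootstrap value assumes that any single agent's action in the next state may be overridden by an adversary. The class below is our own illustrative reading (single-agent attacks, exhaustive joint-action enumeration), not the published RoM-Q algorithm.

```python
from collections import defaultdict
from itertools import product


class RobustJointQ:
    """Joint-action Q-learning with a worst-case (attacked) bootstrap value."""

    def __init__(self, action_sets, alpha=0.1, gamma=0.95):
        self.action_sets = action_sets       # one list of actions per agent
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)          # (state, joint_action) -> value

    def attacked_value(self, state, joint):
        """Worst case over which single agent is attacked and which action
        it is forced to take, given the intended joint action."""
        worst = self.q[(state, joint)]
        for i, actions in enumerate(self.action_sets):
            for forced in actions:
                perturbed = joint[:i] + (forced,) + joint[i + 1:]
                worst = min(worst, self.q[(state, perturbed)])
        return worst

    def robust_state_value(self, state):
        return max(self.attacked_value(state, j) for j in product(*self.action_sets))

    def update(self, state, joint_action, reward, next_state, done):
        target = reward if done else reward + self.gamma * self.robust_state_value(next_state)
        self.q[(state, joint_action)] += self.alpha * (target - self.q[(state, joint_action)])
```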

    Learning on a Budget Using Distributional RL

    Agents acting in real-world scenarios often face constraints such as finite budgets or daily job-performance targets. While repeated (episodic) tasks can be solved with existing RL algorithms, methods need to be extended if the repetition depends on performance. Recent work has introduced a distributional perspective on reinforcement learning, providing a model of episodic returns. Inspired by these results, we contribute the new budget- and risk-aware distributional reinforcement learning (BRAD-RL) algorithm, which bootstraps from the C51 distributional output and then uses value iteration to estimate the value of starting an episode with a certain amount of budget. With this strategy we can make budget-wise action selections within each episode and maximize the return across episodes. Experiments in a grid-world domain highlight the benefits of our algorithm, which maximizes discounted future returns when low cumulative performance may terminate the repetition.
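
    To make the cross-episode step concrete, the toy below runs a value iteration over discrete budget levels, feeding on a C51-style categorical return distribution; the per-outcome budget costs and all constants are assumptions for illustration and not the BRAD-RL algorithm itself.

```python
import numpy as np


def budget_values(atoms, probs, costs, max_budget, gamma=0.99):
    """V[b] = discounted value of starting an episode with b budget units left,
    where one episode yields return atoms[i] with probability probs[i] and
    consumes costs[i] units of budget; V[0] = 0 (no budget, no episode)."""
    v = np.zeros(max_budget + 1)
    for b in range(1, max_budget + 1):
        v[b] = sum(p * (z + gamma * v[max(b - c, 0)])
                   for z, p, c in zip(atoms, probs, costs))
    return v


# C51-style support of 51 atoms; unlucky episodes are assumed to burn extra budget.
atoms = np.linspace(-10.0, 10.0, 51)
probs = np.exp(-0.5 * ((atoms - 3.0) / 2.0) ** 2)
probs /= probs.sum()
costs = np.where(atoms < 0, 2, 1)
print(budget_values(atoms, probs, costs, max_budget=6))
```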